Search Results

Documents authored by Inenaga, Shunsuke


Document
Computing Palindromes on a Trie in Linear Time

Authors: Takuya Mieno, Mitsuru Funakoshi, and Shunsuke Inenaga

Published in: LIPIcs, Volume 248, 33rd International Symposium on Algorithms and Computation (ISAAC 2022)


Abstract
A trie 𝒯 is a rooted tree such that each edge is labeled by a single character from the alphabet, and the labels of out-going edges from the same node are mutually distinct. Given a trie 𝒯 with n edges, we show how to compute all distinct palindromes and all maximal palindromes on 𝒯 in O(n) time, in the case of integer alphabets of size polynomial in n. This improves the state-of-the-art O(n log h)-time algorithms by Funakoshi et al. [PSC 2019], where h is the height of 𝒯. Using our new algorithms, the eertree with suffix links for a given trie 𝒯 can readily be obtained in O(n) time. Further, our trie-based O(n)-space data structure allows us to report all distinct palindromes and maximal palindromes in a query string represented in the trie 𝒯, in output optimal time. This is an improvement over an existing (naïve) solution that precomputes and stores all distinct palindromes and maximal palindromes for each and every string in the trie 𝒯 separately, using a total O(n²) preprocessing time and space, and reports them in output optimal time upon query.

Cite as

Takuya Mieno, Mitsuru Funakoshi, and Shunsuke Inenaga. Computing Palindromes on a Trie in Linear Time. In 33rd International Symposium on Algorithms and Computation (ISAAC 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 248, pp. 15:1-15:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{mieno_et_al:LIPIcs.ISAAC.2022.15,
  author =	{Mieno, Takuya and Funakoshi, Mitsuru and Inenaga, Shunsuke},
  title =	{{Computing Palindromes on a Trie in Linear Time}},
  booktitle =	{33rd International Symposium on Algorithms and Computation (ISAAC 2022)},
  pages =	{15:1--15:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-258-7},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{248},
  editor =	{Bae, Sang Won and Park, Heejin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2022.15},
  URN =		{urn:nbn:de:0030-drops-173006},
  doi =		{10.4230/LIPIcs.ISAAC.2022.15},
  annote =	{Keywords: palindromes, suffix trees, tries, labeled trees, eertrees}
}
Document
Cartesian Tree Subsequence Matching

Authors: Tsubasa Oizumi, Takeshi Kai, Takuya Mieno, Shunsuke Inenaga, and Hiroki Arimura

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)


Abstract
Park et al. [TCS 2020] observed that the similarity between two (numerical) strings can be captured by the Cartesian trees: The Cartesian tree of a string is a binary tree recursively constructed by picking up the smallest value of the string as the root of the tree. Two strings of equal length are said to Cartesian-tree match if their Cartesian trees are isomorphic. Park et al. [TCS 2020] introduced the following Cartesian tree substring matching (CTMStr) problem: Given a text string T of length n and a pattern string of length m, find every consecutive substring S = T[i..j] of a text string T such that S and P Cartesian-tree match. They showed how to solve this problem in Õ(n+m) time. In this paper, we introduce the Cartesian tree subsequence matching (CTMSeq) problem, that asks to find every minimal substring S = T[i..j] of T such that S contains a subsequence S' which Cartesian-tree matches P. We prove that the CTMSeq problem can be solved efficiently, in O(m n p(n)) time, where p(n) denotes the update/query time for dynamic predecessor queries. By using a suitable dynamic predecessor data structure, we obtain O(mn log log n)-time and O(n log m)-space solution for CTMSeq. This contrasts CTMSeq with closely related order-preserving subsequence matching (OPMSeq) which was shown to be NP-hard by Bose et al. [IPL 1998].

Cite as

Tsubasa Oizumi, Takeshi Kai, Takuya Mieno, Shunsuke Inenaga, and Hiroki Arimura. Cartesian Tree Subsequence Matching. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 14:1-14:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{oizumi_et_al:LIPIcs.CPM.2022.14,
  author =	{Oizumi, Tsubasa and Kai, Takeshi and Mieno, Takuya and Inenaga, Shunsuke and Arimura, Hiroki},
  title =	{{Cartesian Tree Subsequence Matching}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{14:1--14:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.14},
  URN =		{urn:nbn:de:0030-drops-161414},
  doi =		{10.4230/LIPIcs.CPM.2022.14},
  annote =	{Keywords: string algorithms, pattern matching, Cartesian tree subsequence matching, order preserving matching, episode matching}
}
Document
{RePair} Grammars Are the Smallest Grammars for Fibonacci Words

Authors: Takuya Mieno, Shunsuke Inenaga, and Takashi Horiyama

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)


Abstract
Grammar-based compression is a loss-less data compression scheme that represents a given string w by a context-free grammar that generates only w. While computing the smallest grammar which generates a given string w is NP-hard in general, a number of polynomial-time grammar-based compressors which work well in practice have been proposed. RePair, proposed by Larsson and Moffat in 1999, is a grammar-based compressor which recursively replaces all possible occurrences of a most frequently occurring bigrams in the string. Since there can be multiple choices of the most frequent bigrams to replace, different implementations of RePair can result in different grammars. In this paper, we show that the smallest grammars generating the Fibonacci words F_k can be completely characterized by RePair, where F_k denotes the k-th Fibonacci word. Namely, all grammars for F_k generated by any implementation of RePair are the smallest grammars for F_k, and no other grammars can be the smallest for F_k. To the best of our knowledge, Fibonacci words are the first non-trivial infinite family of strings for which RePair is optimal.

Cite as

Takuya Mieno, Shunsuke Inenaga, and Takashi Horiyama. {RePair} Grammars Are the Smallest Grammars for Fibonacci Words. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 26:1-26:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{mieno_et_al:LIPIcs.CPM.2022.26,
  author =	{Mieno, Takuya and Inenaga, Shunsuke and Horiyama, Takashi},
  title =	{{\{RePair\} Grammars Are the Smallest Grammars for Fibonacci Words}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{26:1--26:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.26},
  URN =		{urn:nbn:de:0030-drops-161530},
  doi =		{10.4230/LIPIcs.CPM.2022.26},
  annote =	{Keywords: grammar based compression, Fibonacci words, RePair, smallest grammar problem}
}
Document
Minimal Absent Words on Run-Length Encoded Strings

Authors: Tooru Akagi, Kouta Okabe, Takuya Mieno, Yuto Nakashima, and Shunsuke Inenaga

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)


Abstract
A string w is called a minimal absent word for another string T if w does not occur (as a substring) in T and all proper substrings of w occur in T. State-of-the-art data structures for reporting the set MAW(T) of MAWs from a given string T of length n require O(n) space, can be built in O(n) time, and can report all MAWs in O(|MAW(T)|) time upon a query. This paper initiates the problem of computing MAWs from a compressed representation of a string. In particular, we focus on the most basic compressed representation of a string, run-length encoding (RLE), which represents each maximal run of the same characters a by a^p where p is the length of the run. Let m be the RLE-size of string T. After categorizing the MAWs into five disjoint sets ℳ₁, ℳ₂, ℳ₃, ℳ₄, ℳ₅ using RLE, we present matching upper and lower bounds for the number of MAWs in ℳ_i for i = 1,2,4,5 in terms of RLE-size m, except for ℳ₃ whose size is unbounded by m. We then present a compact O(m)-space data structure that can report all MAWs in optimal O(|MAW(T)|) time.

Cite as

Tooru Akagi, Kouta Okabe, Takuya Mieno, Yuto Nakashima, and Shunsuke Inenaga. Minimal Absent Words on Run-Length Encoded Strings. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 27:1-27:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{akagi_et_al:LIPIcs.CPM.2022.27,
  author =	{Akagi, Tooru and Okabe, Kouta and Mieno, Takuya and Nakashima, Yuto and Inenaga, Shunsuke},
  title =	{{Minimal Absent Words on Run-Length Encoded Strings}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{27:1--27:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.27},
  URN =		{urn:nbn:de:0030-drops-161545},
  doi =		{10.4230/LIPIcs.CPM.2022.27},
  annote =	{Keywords: string algorithms, combinatorics on words, minimal absent words, run-length encoding}
}
Document
A Reduction of the Dynamic Time Warping Distance to the Longest Increasing Subsequence Length

Authors: Yoshifumi Sakai and Shunsuke Inenaga

Published in: LIPIcs, Volume 181, 31st International Symposium on Algorithms and Computation (ISAAC 2020)


Abstract
The similarity between a pair of time series, i.e., sequences of indexed values in time order, is often estimated by the dynamic time warping (DTW) distance, instead of any in the well-studied family of measures including the longest common subsequence (LCS) length and the edit distance. Although it may seem as if the DTW and the LCS(-like) measures are essentially different, we reveal that the DTW distance can be represented by the longest increasing subsequence (LIS) length of a sequence of integers, which is the LCS length between the integer sequence and itself sorted. For a given pair of time series of n integers between zero and c, we propose an integer sequence that represents any substring-substring DTW distance as its band-substring LIS length. The length of the produced integer sequence is O(c⁴ n²) or O(c² n²) depending on the variant of the DTW distance used, both of which can be translated to O(n²) for constant cost functions. To demonstrate that techniques developed under the LCS(-like) measures are directly applicable to analysis of time series via our reduction of DTW to LIS, we present time-efficient algorithms for DTW-related problems utilizing the semi-local sequence comparison technique developed for LCS-related problems.

Cite as

Yoshifumi Sakai and Shunsuke Inenaga. A Reduction of the Dynamic Time Warping Distance to the Longest Increasing Subsequence Length. In 31st International Symposium on Algorithms and Computation (ISAAC 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 181, pp. 6:1-6:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{sakai_et_al:LIPIcs.ISAAC.2020.6,
  author =	{Sakai, Yoshifumi and Inenaga, Shunsuke},
  title =	{{A Reduction of the Dynamic Time Warping Distance to the Longest Increasing Subsequence Length}},
  booktitle =	{31st International Symposium on Algorithms and Computation (ISAAC 2020)},
  pages =	{6:1--6:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-173-3},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{181},
  editor =	{Cao, Yixin and Cheng, Siu-Wing and Li, Minming},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2020.6},
  URN =		{urn:nbn:de:0030-drops-133508},
  doi =		{10.4230/LIPIcs.ISAAC.2020.6},
  annote =	{Keywords: algorithms, dynamic time warping distance, longest increasing subsequence, semi-local sequence comparison}
}
Document
Detecting k-(Sub-)Cadences and Equidistant Subsequence Occurrences

Authors: Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, and Ayumi Shinohara

Published in: LIPIcs, Volume 161, 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)


Abstract
The equidistant subsequence pattern matching problem is considered. Given a pattern string P and a text string T, we say that P is an equidistant subsequence of T if P is a subsequence of the text such that consecutive symbols of P in the occurrence are equally spaced. We can consider the problem of equidistant subsequences as generalizations of (sub-)cadences. We give bit-parallel algorithms that yield o(n²) time algorithms for finding k-(sub-)cadences and equidistant subsequences. Furthermore, O(nlog² n) and O(nlog n) time algorithms, respectively for equidistant and Abelian equidistant matching for the case |P| = 3, are shown. The algorithms make use of a technique that was recently introduced which can efficiently compute convolutions with linear constraints.

Cite as

Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, and Ayumi Shinohara. Detecting k-(Sub-)Cadences and Equidistant Subsequence Occurrences. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 161, pp. 12:1-12:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{funakoshi_et_al:LIPIcs.CPM.2020.12,
  author =	{Funakoshi, Mitsuru and Nakashima, Yuto and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki and Shinohara, Ayumi},
  title =	{{Detecting k-(Sub-)Cadences and Equidistant Subsequence Occurrences}},
  booktitle =	{31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)},
  pages =	{12:1--12:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-149-8},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{161},
  editor =	{G{\o}rtz, Inge Li and Weimann, Oren},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2020.12},
  URN =		{urn:nbn:de:0030-drops-121375},
  doi =		{10.4230/LIPIcs.CPM.2020.12},
  annote =	{Keywords: string algorithms, pattern matching, bit parallelism, subsequences, cadences}
}
Document
DAWGs for Parameterized Matching: Online Construction and Related Indexing Structures

Authors: Katsuhito Nakashima, Noriki Fujisato, Diptarama Hendrian, Yuto Nakashima, Ryo Yoshinaka, Shunsuke Inenaga, Hideo Bannai, Ayumi Shinohara, and Masayuki Takeda

Published in: LIPIcs, Volume 161, 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)


Abstract
Two strings x and y over Σ ∪ Π of equal length are said to parameterized match (p-match) if there is a renaming bijection f:Σ ∪ Π → Σ ∪ Π that is identity on Σ and transforms x to y (or vice versa). The p-matching problem is to look for substrings in a text that p-match a given pattern. In this paper, we propose parameterized suffix automata (p-suffix automata) and parameterized directed acyclic word graphs (PDAWGs) which are the p-matching versions of suffix automata and DAWGs. While suffix automata and DAWGs are equivalent for standard strings, we show that p-suffix automata can have Θ(n²) nodes and edges but PDAWGs have only O(n) nodes and edges, where n is the length of an input string. We also give O(n |Π| log (|Π| + |Σ|))-time O(n)-space algorithm that builds the PDAWG in a left-to-right online manner. As a byproduct, it is shown that the parameterized suffix tree for the reversed string can also be built in the same time and space, in a right-to-left online manner.

Cite as

Katsuhito Nakashima, Noriki Fujisato, Diptarama Hendrian, Yuto Nakashima, Ryo Yoshinaka, Shunsuke Inenaga, Hideo Bannai, Ayumi Shinohara, and Masayuki Takeda. DAWGs for Parameterized Matching: Online Construction and Related Indexing Structures. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 161, pp. 26:1-26:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{nakashima_et_al:LIPIcs.CPM.2020.26,
  author =	{Nakashima, Katsuhito and Fujisato, Noriki and Hendrian, Diptarama and Nakashima, Yuto and Yoshinaka, Ryo and Inenaga, Shunsuke and Bannai, Hideo and Shinohara, Ayumi and Takeda, Masayuki},
  title =	{{DAWGs for Parameterized Matching: Online Construction and Related Indexing Structures}},
  booktitle =	{31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)},
  pages =	{26:1--26:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-149-8},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{161},
  editor =	{G{\o}rtz, Inge Li and Weimann, Oren},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2020.26},
  URN =		{urn:nbn:de:0030-drops-121512},
  doi =		{10.4230/LIPIcs.CPM.2020.26},
  annote =	{Keywords: parameterized matching, suffix trees, DAWGs, suffix automata}
}
Document
An Improved Data Structure for Left-Right Maximal Generic Words Problem

Authors: Yuta Fujishige, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda

Published in: LIPIcs, Volume 149, 30th International Symposium on Algorithms and Computation (ISAAC 2019)


Abstract
For a set D of documents and a positive integer d, a string w is said to be d-left-right maximal, if (1) w occurs in at least d documents in D, and (2) any proper superstring of w occurs in less than d documents. The left-right-maximal generic words problem is, given a set D of documents, to preprocess D so that for any string p and for any positive integer d, all the superstrings of p that are d-left-right maximal can be answered quickly. In this paper, we present an O(n log m) space data structure (in words) which answers queries in O(|p| + o log log m) time, where n is the total length of documents in D, m is the number of documents in D and o is the number of outputs. Our solution improves the previous one by Nishimoto et al. (PSC 2015), which uses an O(n log n) space data structure answering queries in O(|p|+ r * log n + o * log^2 n) time, where r is the number of right-extensions q of p occurring in at least d documents such that any proper right extension of q occurs in less than d documents.

Cite as

Yuta Fujishige, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. An Improved Data Structure for Left-Right Maximal Generic Words Problem. In 30th International Symposium on Algorithms and Computation (ISAAC 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 149, pp. 40:1-40:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{fujishige_et_al:LIPIcs.ISAAC.2019.40,
  author =	{Fujishige, Yuta and Nakashima, Yuto and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{An Improved Data Structure for Left-Right Maximal Generic Words Problem}},
  booktitle =	{30th International Symposium on Algorithms and Computation (ISAAC 2019)},
  pages =	{40:1--40:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-130-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{149},
  editor =	{Lu, Pinyan and Zhang, Guochuan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2019.40},
  URN =		{urn:nbn:de:0030-drops-115366},
  doi =		{10.4230/LIPIcs.ISAAC.2019.40},
  annote =	{Keywords: generic words, suffix trees, string processing algorithms}
}
Document
Computing Runs on a Trie

Authors: Ryo Sugahara, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda

Published in: LIPIcs, Volume 128, 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)


Abstract
A maximal repetition, or run, in a string, is a maximal periodic substring whose smallest period is at most half the length of the substring. In this paper, we consider runs that correspond to a path on a trie, or in other words, on a rooted edge-labeled tree where the endpoints of the path must be a descendant/ancestor of the other. For a trie with n edges, we show that the number of runs is less than n. We also show an O(n sqrt{log n}log log n) time and O(n) space algorithm for counting and finding the shallower endpoint of all runs. We further show an O(n log n) time and O(n) space algorithm for finding both endpoints of all runs. We also discuss how to improve the running time even more.

Cite as

Ryo Sugahara, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Computing Runs on a Trie. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 23:1-23:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{sugahara_et_al:LIPIcs.CPM.2019.23,
  author =	{Sugahara, Ryo and Nakashima, Yuto and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Computing Runs on a Trie}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{23:1--23:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.23},
  URN =		{urn:nbn:de:0030-drops-104943},
  doi =		{10.4230/LIPIcs.CPM.2019.23},
  annote =	{Keywords: runs, Lyndon words}
}
Document
Faster Queries for Longest Substring Palindrome After Block Edit

Authors: Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda

Published in: LIPIcs, Volume 128, 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)


Abstract
Palindromes are important objects in strings which have been extensively studied from combinatorial, algorithmic, and bioinformatics points of views. Manacher [J. ACM 1975] proposed a seminal algorithm that computes the longest substring palindromes (LSPals) of a given string in O(n) time, where n is the length of the string. In this paper, we consider the problem of finding the LSPal after the string is edited. We present an algorithm that uses O(n) time and space for preprocessing, and answers the length of the LSPals in O(l + log log n) time, after a substring in T is replaced by a string of arbitrary length l. This outperforms the query algorithm proposed in our previous work [CPM 2018] that uses O(l + log n) time for each query.

Cite as

Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Faster Queries for Longest Substring Palindrome After Block Edit. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 27:1-27:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{funakoshi_et_al:LIPIcs.CPM.2019.27,
  author =	{Funakoshi, Mitsuru and Nakashima, Yuto and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Faster Queries for Longest Substring Palindrome After Block Edit}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{27:1--27:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.27},
  URN =		{urn:nbn:de:0030-drops-104989},
  doi =		{10.4230/LIPIcs.CPM.2019.27},
  annote =	{Keywords: palindromes, string algorithm, periodicity}
}
Document
On the Size of Overlapping Lempel-Ziv and Lyndon Factorizations

Authors: Yuki Urabe, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda

Published in: LIPIcs, Volume 128, 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)


Abstract
Lempel-Ziv (LZ) factorization and Lyndon factorization are well-known factorizations of strings. Recently, Kärkkäinen et al. studied the relation between the sizes of the two factorizations, and showed that the size of the Lyndon factorization is always smaller than twice the size of the non-overlapping LZ factorization [STACS 2017]. In this paper, we consider a similar problem for the overlapping version of the LZ factorization. Since the size of the overlapping LZ factorization is always smaller than the size of the non-overlapping LZ factorization and, in fact, can even be an O(log n) factor smaller, it is not immediately clear whether a similar bound as in previous work would hold. Nevertheless, in this paper, we prove that the size of the Lyndon factorization is always smaller than four times the size of the overlapping LZ factorization.

Cite as

Yuki Urabe, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. On the Size of Overlapping Lempel-Ziv and Lyndon Factorizations. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 29:1-29:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{urabe_et_al:LIPIcs.CPM.2019.29,
  author =	{Urabe, Yuki and Nakashima, Yuto and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{On the Size of Overlapping Lempel-Ziv and Lyndon Factorizations}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{29:1--29:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.29},
  URN =		{urn:nbn:de:0030-drops-105008},
  doi =		{10.4230/LIPIcs.CPM.2019.29},
  annote =	{Keywords: Lyndon factorization, Lyndon words, Lempel-Ziv factorization}
}
Document
Online Algorithms for Constructing Linear-Size Suffix Trie

Authors: Diptarama Hendrian, Takuya Takagi, and Shunsuke Inenaga

Published in: LIPIcs, Volume 128, 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)


Abstract
The suffix trees are fundamental data structures for various kinds of string processing. The suffix tree of a string T of length n has O(n) nodes and edges, and the string label of each edge is encoded by a pair of positions in T. Thus, even after the tree is built, the input text T needs to be kept stored and random access to T is still needed. The linear-size suffix tries (LSTs), proposed by Crochemore et al. [Linear-size suffix tries, TCS 638:171-178, 2016], are a "stand-alone" alternative to the suffix trees. Namely, the LST of a string T of length n occupies O(n) total space, and supports pattern matching and other tasks in the same efficiency as the suffix tree without the need to store the input text T. Crochemore et al. proposed an offline algorithm which transforms the suffix tree of T into the LST of T in O(n log sigma) time and O(n) space, where sigma is the alphabet size. In this paper, we present two types of online algorithms which "directly" construct the LST, from right to left, and from left to right, without constructing the suffix tree as an intermediate structure. Both algorithms construct the LST incrementally when a new symbol is read, and do not access to the previously read symbols. The right-to-left construction algorithm works in O(n log sigma) time and O(n) space and the left-to-right construction algorithm works in O(n (log sigma + log n / log log n)) time and O(n) space. The main feature of our algorithms is that the input text does not need to be stored.

Cite as

Diptarama Hendrian, Takuya Takagi, and Shunsuke Inenaga. Online Algorithms for Constructing Linear-Size Suffix Trie. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 30:1-30:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{hendrian_et_al:LIPIcs.CPM.2019.30,
  author =	{Hendrian, Diptarama and Takagi, Takuya and Inenaga, Shunsuke},
  title =	{{Online Algorithms for Constructing Linear-Size Suffix Trie}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{30:1--30:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.30},
  URN =		{urn:nbn:de:0030-drops-105016},
  doi =		{10.4230/LIPIcs.CPM.2019.30},
  annote =	{Keywords: Indexing structure, Linear-size suffix trie, Online algorithm, Pattern Matching}
}
Document
Faster Online Elastic Degenerate String Matching

Authors: Kotaro Aoyama, Yuto Nakashima, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda

Published in: LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)


Abstract
An Elastic-Degenerate String [Iliopoulus et al., LATA 2017] is a sequence of sets of strings, which was recently proposed as a way to model a set of similar sequences. We give an online algorithm for the Elastic-Degenerate String Matching (EDSM) problem that runs in O(nm sqrt{m log m} + N) time and O(m) working space, where n is the number of elastic degenerate segments of the text, N is the total length of all strings in the text, and m is the length of the pattern. This improves the previous algorithm by Grossi et al. [CPM 2017] that runs in O(nm^2 + N) time.

Cite as

Kotaro Aoyama, Yuto Nakashima, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Faster Online Elastic Degenerate String Matching. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 9:1-9:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{aoyama_et_al:LIPIcs.CPM.2018.9,
  author =	{Aoyama, Kotaro and Nakashima, Yuto and I, Tomohiro and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Faster Online Elastic Degenerate String Matching}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{9:1--9:10},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.9},
  URN =		{urn:nbn:de:0030-drops-87016},
  doi =		{10.4230/LIPIcs.CPM.2018.9},
  annote =	{Keywords: elastic degenerate pattern matching, boolean convolution}
}
Document
Longest substring palindrome after edit

Authors: Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda

Published in: LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)


Abstract
It is known that the length of the longest substring palindromes (LSPals) of a given string T of length n can be computed in O(n) time by Manacher's algorithm [J. ACM '75]. In this paper, we consider the problem of finding the LSPal after the string is edited. We present an algorithm that uses O(n) time and space for preprocessing, and answers the length of the LSPals in O(log (min {sigma, log n })) time after single character substitution, insertion, or deletion, where sigma denotes the number of distinct characters appearing in T. We also propose an algorithm that uses O(n) time and space for preprocessing, and answers the length of the LSPals in O(l + log n) time, after an existing substring in T is replaced by a string of arbitrary length l.

Cite as

Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Longest substring palindrome after edit. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 12:1-12:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{funakoshi_et_al:LIPIcs.CPM.2018.12,
  author =	{Funakoshi, Mitsuru and Nakashima, Yuto and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Longest substring palindrome after edit}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{12:1--12:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.12},
  URN =		{urn:nbn:de:0030-drops-86977},
  doi =		{10.4230/LIPIcs.CPM.2018.12},
  annote =	{Keywords: maximal palindromes, edit operations, periodicity, suffix trees}
}
Document
Computing longest common square subsequences

Authors: Takafumi Inoue, Shunsuke Inenaga, Heikki Hyyrö, Hideo Bannai, and Masayuki Takeda

Published in: LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)


Abstract
A square is a non-empty string of form YY. The longest common square subsequence (LCSqS) problem is to compute a longest square occurring as a subsequence in two given strings A and B. We show that the problem can easily be solved in O(n^6) time or O(|M|n^4) time with O(n^4) space, where n is the length of the strings and M is the set of matching points between A and B. Then, we show that the problem can also be solved in O(sigma |M|^3 + n) time and O(|M|^2 + n) space, or in O(|M|^3 log^2 n log log n + n) time with O(|M|^3 + n) space, where sigma is the number of distinct characters occurring in A and B. We also study lower bounds for the LCSqS problem for two or more strings.

Cite as

Takafumi Inoue, Shunsuke Inenaga, Heikki Hyyrö, Hideo Bannai, and Masayuki Takeda. Computing longest common square subsequences. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 15:1-15:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{inoue_et_al:LIPIcs.CPM.2018.15,
  author =	{Inoue, Takafumi and Inenaga, Shunsuke and Hyyr\"{o}, Heikki and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Computing longest common square subsequences}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{15:1--15:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.15},
  URN =		{urn:nbn:de:0030-drops-86946},
  doi =		{10.4230/LIPIcs.CPM.2018.15},
  annote =	{Keywords: squares, subsequences, matching rectangles, dynamic programming}
}
Document
Longest Lyndon Substring After Edit

Authors: Yuki Urabe, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda

Published in: LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)


Abstract
The longest Lyndon substring of a string T is the longest substring of T which is a Lyndon word. LLS(T) denotes the length of the longest Lyndon substring of a string T. In this paper, we consider computing LLS(T') where T' is an edited string formed from T. After O(n) time and space preprocessing, our algorithm returns LLS(T') in O(log n) time for any single character edit. We also consider a version of the problem with block edits, i.e., a substring of T is replaced by a given string of length l. After O(n) time and space preprocessing, our algorithm returns LLS(T') in O(l log sigma + log n) time for any block edit where sigma is the number of distinct characters in T. We can modify our algorithm so as to output all the longest Lyndon substrings of T' for both problems.

Cite as

Yuki Urabe, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Longest Lyndon Substring After Edit. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 19:1-19:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{urabe_et_al:LIPIcs.CPM.2018.19,
  author =	{Urabe, Yuki and Nakashima, Yuto and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Longest Lyndon Substring After Edit}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{19:1--19:10},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.19},
  URN =		{urn:nbn:de:0030-drops-86913},
  doi =		{10.4230/LIPIcs.CPM.2018.19},
  annote =	{Keywords: Lyndon word, Lyndon factorization, Lyndon tree, Edit operation}
}
Document
Lyndon Factorization of Grammar Compressed Texts Revisited

Authors: Isamu Furuya, Yuto Nakashima, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda

Published in: LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)


Abstract
We revisit the problem of computing the Lyndon factorization of a string w of length N which is given as a straight line program (SLP) of size n. For this problem, we show a new algorithm which runs in O(P(n, N) + Q(n, N)n log log N) time and O(n log N + S(n, N)) space where P(n, N), S(n,N), Q(n,N) are respectively the pre-processing time, space, and query time of a data structure for longest common extensions (LCE) on SLPs. Our algorithm improves the algorithm proposed by I et al. (TCS '17), and can be more efficient than the O(N)-time solution by Duval (J. Algorithms '83) when w is highly compressible.

Cite as

Isamu Furuya, Yuto Nakashima, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Lyndon Factorization of Grammar Compressed Texts Revisited. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 24:1-24:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{furuya_et_al:LIPIcs.CPM.2018.24,
  author =	{Furuya, Isamu and Nakashima, Yuto and I, Tomohiro and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Lyndon Factorization of Grammar Compressed Texts Revisited}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{24:1--24:10},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.24},
  URN =		{urn:nbn:de:0030-drops-86855},
  doi =		{10.4230/LIPIcs.CPM.2018.24},
  annote =	{Keywords: Lyndon word, Lyndon factorization, Straight line program}
}
Document
Almost Linear Time Computation of Maximal Repetitions in Run Length Encoded Strings

Authors: Yuta Fujishige, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda

Published in: LIPIcs, Volume 92, 28th International Symposium on Algorithms and Computation (ISAAC 2017)


Abstract
We consider the problem of computing all maximal repetitions contained in a string that is given in run-length encoding. Given a run-length encoding of a string, we show that the maximum number of maximal repetitions contained in the string is at most m+k-1, where m is the size of the run-length encoding, and k is the number of run-length factors whose exponent is at least 2. We also show an algorithm for computing all maximal repetitions in O(m \alpha(m)) time and O(m) space, where \alpha denotes the inverse Ackermann function.

Cite as

Yuta Fujishige, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Almost Linear Time Computation of Maximal Repetitions in Run Length Encoded Strings. In 28th International Symposium on Algorithms and Computation (ISAAC 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 92, pp. 33:1-33:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{fujishige_et_al:LIPIcs.ISAAC.2017.33,
  author =	{Fujishige, Yuta and Nakashima, Yuto and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Almost Linear Time Computation of Maximal Repetitions in Run Length Encoded Strings}},
  booktitle =	{28th International Symposium on Algorithms and Computation (ISAAC 2017)},
  pages =	{33:1--33:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-054-5},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{92},
  editor =	{Okamoto, Yoshio and Tokuyama, Takeshi},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2017.33},
  URN =		{urn:nbn:de:0030-drops-82610},
  doi =		{10.4230/LIPIcs.ISAAC.2017.33},
  annote =	{Keywords: maximal repetitions,run length encoding}
}
Document
Small-Space LCE Data Structure with Constant-Time Queries

Authors: Yuka Tanimura, Takaaki Nishimoto, Hideo Bannai, Shunsuke Inenaga, and Masayuki Takeda

Published in: LIPIcs, Volume 83, 42nd International Symposium on Mathematical Foundations of Computer Science (MFCS 2017)


Abstract
The longest common extension (LCE) problem is to preprocess a given string w of length n so that the length of the longest common prefix between suffixes of w that start at any two given positions is answered quickly. In this paper, we present a data structure of O(z \tau^2 + \frac{n}{\tau}) words of space which answers LCE queries in O(1) time and can be built in O(n \log \sigma) time, where 1 \leq \tau \leq \sqrt{n} is a parameter, z is the size of the Lempel-Ziv 77 factorization of w and \sigma is the alphabet size. The proposed LCE data structure not access the input string w when answering queries, and thus w can be deleted after preprocessing. On top of this main result, we obtain further results using (variants of) our LCE data structure, which include the following: - For highly repetitive strings where the z\tau^2 term is dominated by \frac{n}{\tau}, we obtain a constant-time and sub-linear space LCE query data structure. - Even when the input string is not well compressible via Lempel-Ziv 77 factorization, we still can obtain a constant-time and sub-linear space LCE data structure for suitable \tau and for \sigma \leq 2^{o(\log n)}. - The time-space trade-off lower bounds for the LCE problem by Bille et al. [J. Discrete Algorithms, 25:42-50, 2014] and by Kosolobov [CoRR, abs/1611.02891, 2016] do not apply in some cases with our LCE data structure.

Cite as

Yuka Tanimura, Takaaki Nishimoto, Hideo Bannai, Shunsuke Inenaga, and Masayuki Takeda. Small-Space LCE Data Structure with Constant-Time Queries. In 42nd International Symposium on Mathematical Foundations of Computer Science (MFCS 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 83, pp. 10:1-10:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{tanimura_et_al:LIPIcs.MFCS.2017.10,
  author =	{Tanimura, Yuka and Nishimoto, Takaaki and Bannai, Hideo and Inenaga, Shunsuke and Takeda, Masayuki},
  title =	{{Small-Space LCE Data Structure with Constant-Time Queries}},
  booktitle =	{42nd International Symposium on Mathematical Foundations of Computer Science (MFCS 2017)},
  pages =	{10:1--10:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-046-0},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{83},
  editor =	{Larsen, Kim G. and Bodlaender, Hans L. and Raskin, Jean-Francois},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2017.10},
  URN =		{urn:nbn:de:0030-drops-81021},
  doi =		{10.4230/LIPIcs.MFCS.2017.10},
  annote =	{Keywords: longest common extension, truncated suffix trees, t-covers}
}
Document
Faster STR-IC-LCS Computation via RLE

Authors: Keita Kuboi, Yuta Fujishige, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda

Published in: LIPIcs, Volume 78, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)


Abstract
The constrained LCS problem asks one to find a longest common subsequence of two input strings A and B with some constraints. The STR-IC-LCS problem is a variant of the constrained LCS problem, where the solution must include a given constraint string C as a substring. Given two strings A and B of respective lengths M and N, and a constraint string C of length at most min{M, N}, the best known algorithm for the STR-IC-LCS problem, proposed by Deorowicz (Inf. Process. Lett., 11:423-426, 2012), runs in O(MN) time. In this work, we present an O(mN + nM)-time solution to the STR-IC-LCS problem, where m and n denote the sizes of the run-length encodings of A and B, respectively. Since m <= M and n <= N always hold, our algorithm is always as fast as Deorowicz's algorithm, and is faster when input strings are compressible via RLE.

Cite as

Keita Kuboi, Yuta Fujishige, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Faster STR-IC-LCS Computation via RLE. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 20:1-20:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{kuboi_et_al:LIPIcs.CPM.2017.20,
  author =	{Kuboi, Keita and Fujishige, Yuta and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Faster STR-IC-LCS Computation via RLE}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{20:1--20:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.20},
  URN =		{urn:nbn:de:0030-drops-73335},
  doi =		{10.4230/LIPIcs.CPM.2017.20},
  annote =	{Keywords: longest common subsequence, STR-IC-LCS, run-length encoding}
}
Document
Computing All Distinct Squares in Linear Time for Integer Alphabets

Authors: Hideo Bannai, Shunsuke Inenaga, and Dominik Köppl

Published in: LIPIcs, Volume 78, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)


Abstract
Given a string on an integer alphabet, we present an algorithm that computes the set of all distinct squares belonging to this string in time linear to the string length. As an application, we show how to compute the tree topology of the minimal augmented suffix tree in linear time. Asides from that, we elaborate an algorithm computing the longest previous table in a succinct representation using compressed working space.

Cite as

Hideo Bannai, Shunsuke Inenaga, and Dominik Köppl. Computing All Distinct Squares in Linear Time for Integer Alphabets. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 22:1-22:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{bannai_et_al:LIPIcs.CPM.2017.22,
  author =	{Bannai, Hideo and Inenaga, Shunsuke and K\"{o}ppl, Dominik},
  title =	{{Computing All Distinct Squares in Linear Time for Integer Alphabets}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{22:1--22:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.22},
  URN =		{urn:nbn:de:0030-drops-73224},
  doi =		{10.4230/LIPIcs.CPM.2017.22},
  annote =	{Keywords: tandem repeats, distinct squares, counting algorithms}
}
Document
Tight Bounds on the Maximum Number of Shortest Unique Substrings

Authors: Takuya Mieno, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda

Published in: LIPIcs, Volume 78, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)


Abstract
A substring Q of a string S is called a shortest unique substring (SUS) for interval [s,t] in S, if Q occurs exactly once in S, this occurrence of Q contains interval [s,t], and every substring of S which contains interval [s,t] and is shorter than Q occurs at least twice in S. The SUS problem is, given a string S, to preprocess S so that for any subsequent query interval [s,t] all the SUSs for interval [s,t] can be answered quickly. When s = t, we call the SUSs for [s, t] as point SUSs, and when s <= t, we call the SUSs for [s, t] as interval SUSs. There exist optimal O(n)-time preprocessing scheme which answers queries in optimal O(k) time for both point and interval SUSs, where n is the length of S and k is the number of outputs for a given query. In this paper, we reveal structural, combinatorial properties underlying the SUS problem: Namely, we show that the number of intervals in S that correspond to point SUSs for all query positions in S is less than 1.5n, and show that this is a matching upper and lower bound. Also, we consider the maximum number of intervals in S that correspond to interval SUSs for all query intervals in S.

Cite as

Takuya Mieno, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Tight Bounds on the Maximum Number of Shortest Unique Substrings. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 24:1-24:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{mieno_et_al:LIPIcs.CPM.2017.24,
  author =	{Mieno, Takuya and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Tight Bounds on the Maximum Number of Shortest Unique Substrings}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{24:1--24:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.24},
  URN =		{urn:nbn:de:0030-drops-73460},
  doi =		{10.4230/LIPIcs.CPM.2017.24},
  annote =	{Keywords: shortest unique substrings, maximal unique substrings}
}
Document
Computing DAWGs and Minimal Absent Words in Linear Time for Integer Alphabets

Authors: Yuta Fujishige, Yuki Tsujimaru, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda

Published in: LIPIcs, Volume 58, 41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016)


Abstract
The directed acyclic word graph (DAWG) of a string y is the smallest (partial) DFA which recognizes all suffixes of y and has only O(n) nodes and edges. We present the first O(n)-time algorithm for computing the DAWG of a given string y of length n over an integer alphabet of polynomial size in n. We also show that a straightforward modification to our DAWG construction algorithm leads to the first O(n)-time algorithm for constructing the affix tree of a given string y over an integer alphabet. Affix trees are a text indexing structure supporting bidirectional pattern searches. As an application to our O(n)-time DAWG construction algorithm, we show that the set MAW(y) of all minimal absent words of y can be computed in optimal O(n + |MAW(y)|) time and O(n) working space for integer alphabets.

Cite as

Yuta Fujishige, Yuki Tsujimaru, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Computing DAWGs and Minimal Absent Words in Linear Time for Integer Alphabets. In 41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 58, pp. 38:1-38:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


Copy BibTex To Clipboard

@InProceedings{fujishige_et_al:LIPIcs.MFCS.2016.38,
  author =	{Fujishige, Yuta and Tsujimaru, Yuki and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Computing DAWGs and Minimal Absent Words in Linear Time for Integer Alphabets}},
  booktitle =	{41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016)},
  pages =	{38:1--38:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-016-3},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{58},
  editor =	{Faliszewski, Piotr and Muscholl, Anca and Niedermeier, Rolf},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2016.38},
  URN =		{urn:nbn:de:0030-drops-64528},
  doi =		{10.4230/LIPIcs.MFCS.2016.38},
  annote =	{Keywords: string algorithms, DAWGs, suffix trees, minimal absent words}
}
Document
Shortest Unique Substring Queries on Run-Length Encoded Strings

Authors: Takuya Mieno, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda

Published in: LIPIcs, Volume 58, 41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016)


Abstract
We consider the problem of answering shortest unique substring (SUS) queries on run-length encoded strings. For a string S, a unique substring u = S[i..j] is said to be a shortest unique substring (SUS) of S containing an interval [s, t] (i <= s <= t <= j) if for any i' <= s <= t <= j' with j-i > j'-i', S[i'..j'] occurs at least twice in S. Given a run-length encoding of size m of a string of length N, we show that we can construct a data structure of size O(m+pi_s(N, m)) in O(m log m + pi_c(N, m)) time such that queries can be answered in O(pi_q(N, m) + k) time, where k is the size of the output (the number of SUSs), and pi_s(N,m), pi_c(N,m), pi_q(N,m) are, respectively, the size, construction time, and query time for a predecessor/successor query data structure of m elements for the universe of [1,N]. Using the data structure by Beam and Fich (JCSS 2002), this results in a data structure of O(m) space that is constructed in O(m log m) time, and answers queries in O(sqrt(log m/loglog m)+k) time.

Cite as

Takuya Mieno, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Shortest Unique Substring Queries on Run-Length Encoded Strings. In 41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 58, pp. 69:1-69:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


Copy BibTex To Clipboard

@InProceedings{mieno_et_al:LIPIcs.MFCS.2016.69,
  author =	{Mieno, Takuya and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Shortest Unique Substring Queries on Run-Length Encoded Strings}},
  booktitle =	{41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016)},
  pages =	{69:1--69:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-016-3},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{58},
  editor =	{Faliszewski, Piotr and Muscholl, Anca and Niedermeier, Rolf},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2016.69},
  URN =		{urn:nbn:de:0030-drops-65033},
  doi =		{10.4230/LIPIcs.MFCS.2016.69},
  annote =	{Keywords: string algorithms, shortest unique substring, run-length encoding}
}
Document
Fully Dynamic Data Structure for LCE Queries in Compressed Space

Authors: Takaaki Nishimoto, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda

Published in: LIPIcs, Volume 58, 41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016)


Abstract
A Longest Common Extension (LCE) query on a text T of length N asks for the length of the longest common prefix of suffixes starting at given two positions. We show that the signature encoding G of size w = O(min(z log N log^* M, N)) [Mehlhorn et al., Algorithmica 17(2):183-198, 1997] of T, which can be seen as a compressed representation of T, has a capability to support LCE queries in O(log N + log ell log^* M) time, where ell is the answer to the query, z is the size of the Lempel-Ziv77 (LZ77) factorization of T, and M >= 4N is an integer that can be handled in constant time under word RAM model. In compressed space, this is the fastest deterministic LCE data structure in many cases. Moreover, G can be enhanced to support efficient update operations: After processing G in O(w f_A) time, we can insert/delete any (sub)string of length y into/from an arbitrary position of T in O((y + log Nlog^* M) f_A) time, where f_A = O(min{ (loglog M loglog w)/(logloglog M), sqrt(log w/loglog w)}). This yields the first fully dynamic LCE data structure working in compressed space. We also present efficient construction algorithms from various types of inputs: We can construct G in O(N f_A) time from uncompressed string T; in O(n loglog (n log^* M) log N log^* M) time from grammar-compressed string T represented by a straight-line program of size n; and in O(z f_A log N log^* M) time from LZ77-compressed string T with z factors. On top of the above contributions, we show several applications of our data structures which improve previous best known results on grammar-compressed string processing.

Cite as

Takaaki Nishimoto, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Fully Dynamic Data Structure for LCE Queries in Compressed Space. In 41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 58, pp. 72:1-72:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


Copy BibTex To Clipboard

@InProceedings{nishimoto_et_al:LIPIcs.MFCS.2016.72,
  author =	{Nishimoto, Takaaki and I, Tomohiro and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Fully Dynamic Data Structure for LCE Queries in Compressed Space}},
  booktitle =	{41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016)},
  pages =	{72:1--72:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-016-3},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{58},
  editor =	{Faliszewski, Piotr and Muscholl, Anca and Niedermeier, Rolf},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2016.72},
  URN =		{urn:nbn:de:0030-drops-65045},
  doi =		{10.4230/LIPIcs.MFCS.2016.72},
  annote =	{Keywords: dynamic texts, longest common extension (LCE) queries, straight-line program}
}
Document
Deterministic Sub-Linear Space LCE Data Structures With Efficient Construction

Authors: Yuka Tanimura, Tomohiro I, Hideo Bannai, Shunsuke Inenaga, Simon J. Puglisi, and Masayuki Takeda

Published in: LIPIcs, Volume 54, 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)


Abstract
Given a string S of n symbols, a longest common extension query LCE(i,j) asks for the length of the longest common prefix of the $i$th and $j$th suffixes of S. LCE queries have several important applications in string processing, perhaps most notably to suffix sorting. Recently, Bille et al. (J. Discrete Algorithms 25:42-50, 2014, Proc. CPM 2015:65-76) described several data structures for answering LCE queries that offers a space-time trade-off between data structure size and query time. In particular, for a parameter 1 <= tau <= n, their best deterministic solution is a data structure of size O(n/tau) which allows LCE queries to be answered in O(tau) time. However, the construction time for all deterministic versions of their data structure is quadratic in n. In this paper, we propose a deterministic solution that achieves a similar space-time trade-off of O(tau * min(log(tau),log(n/tau)) query time using O(n/tau) space, but significantly improve the construction time to O(n*tau).

Cite as

Yuka Tanimura, Tomohiro I, Hideo Bannai, Shunsuke Inenaga, Simon J. Puglisi, and Masayuki Takeda. Deterministic Sub-Linear Space LCE Data Structures With Efficient Construction. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 1:1-1:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


Copy BibTex To Clipboard

@InProceedings{tanimura_et_al:LIPIcs.CPM.2016.1,
  author =	{Tanimura, Yuka and I, Tomohiro and Bannai, Hideo and Inenaga, Shunsuke and Puglisi, Simon J. and Takeda, Masayuki},
  title =	{{Deterministic Sub-Linear Space LCE Data Structures With Efficient Construction}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{1:1--1:10},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.1},
  URN =		{urn:nbn:de:0030-drops-60655},
  doi =		{10.4230/LIPIcs.CPM.2016.1},
  annote =	{Keywords: longest common extension, longest common prefix, sparse suffix array}
}
Document
Fully-online Construction of Suffix Trees for Multiple Texts

Authors: Takuya Takagi, Shunsuke Inenaga, and Hiroki Arimura

Published in: LIPIcs, Volume 54, 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)


Abstract
We consider fully-online construction of indexing data structures for multiple texts. Let T = {T_1, ..., T_K} be a collection of texts. By fully-online, we mean that a new character can be appended to any text in T at any time. This is a natural generalization of semi-online construction of indexing data structures for multiple texts in which, after a new character is appended to the kth text T_k, then its previous texts T_1, ..., T_k-1 will remain static. Our fully-online scenario arises when we maintain dynamic indexes for multi-sensor data. Let N and sigma denote the total length of texts in T and the alphabet size, respectively. We first show that the algorithm by Blumer et al. [Theoretical Computer Science, 40:31-55, 1985] to construct the directed acyclic word graph (DAWG) for T can readily be extended to our fully-online setting, retaining O(N log sigma)-time and O(N)-space complexities. Then, we give a sophisticated fully-online algorithm which constructs the suffix tree for T in O(N log sigma) time and O(N) space. A key idea of this algorithm is synchronized maintenance of the DAWG and the suffix tree.

Cite as

Takuya Takagi, Shunsuke Inenaga, and Hiroki Arimura. Fully-online Construction of Suffix Trees for Multiple Texts. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 22:1-22:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


Copy BibTex To Clipboard

@InProceedings{takagi_et_al:LIPIcs.CPM.2016.22,
  author =	{Takagi, Takuya and Inenaga, Shunsuke and Arimura, Hiroki},
  title =	{{Fully-online Construction of Suffix Trees for Multiple Texts}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{22:1--22:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.22},
  URN =		{urn:nbn:de:0030-drops-60719},
  doi =		{10.4230/LIPIcs.CPM.2016.22},
  annote =	{Keywords: suffix trees, DAWGs, multiple texts, online algorithms}
}
Document
Factorizing a String into Squares in Linear Time

Authors: Yoshiaki Matsuoka, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, and Florin Manea

Published in: LIPIcs, Volume 54, 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)


Abstract
A square factorization of a string w is a factorization of w in which each factor is a square. Dumitran et al. [SPIRE 2015, pp. 54-66] showed how to find a square factorization of a given string of length n in O(n log n) time, and they posed a question whether it can be done in O(n) time. In this paper, we answer their question positively, showing an O(n)-time algorithm for square factorization in the standard word RAM model with machine word size omega = Omega(log n). We also show an O(n + (n log^2 n) / omega)-time (respectively, O(n log n)-time) algorithm to find a square factorization which contains the maximum (respectively, minimum) number of squares.

Cite as

Yoshiaki Matsuoka, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, and Florin Manea. Factorizing a String into Squares in Linear Time. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 27:1-27:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


Copy BibTex To Clipboard

@InProceedings{matsuoka_et_al:LIPIcs.CPM.2016.27,
  author =	{Matsuoka, Yoshiaki and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki and Manea, Florin},
  title =	{{Factorizing a String into Squares in Linear Time}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{27:1--27:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.27},
  URN =		{urn:nbn:de:0030-drops-60645},
  doi =		{10.4230/LIPIcs.CPM.2016.27},
  annote =	{Keywords: Squares, Runs, Factorization of Strings}
}
Document
Efficiently Finding All Maximal alpha-gapped Repeats

Authors: Pawel Gawrychowski, Tomohiro I, Shunsuke Inenaga, Dominik Köppl, and Florin Manea

Published in: LIPIcs, Volume 47, 33rd Symposium on Theoretical Aspects of Computer Science (STACS 2016)


Abstract
For alpha >=1, an alpha-gapped repeat in a word w is a factor uvu of w such that |uv| <= alpha * |u|; the two occurrences of a factor u in such a repeat are called arms. Such a repeat is called maximal if its arms cannot be extended simultaneously with the same symbol to the right nor to the left. We show that the number of all maximal alpha-gapped repeats occurring in words of length n is upper bounded by 18 * alpha * n, allowing us to construct an algorithm finding all maximal alpha-gapped repeats of a word on an integer alphabet of size n^{O}(1)} in {O}(alpha * n) time. This result is optimal as there are words that have Theta(alpha * n) maximal alpha-gapped repeats. Our techniques can be extended to get comparable results in the case of alpha-gapped palindromes, i.e., factors uvu^{T} with |uv| <= alpha |u|.

Cite as

Pawel Gawrychowski, Tomohiro I, Shunsuke Inenaga, Dominik Köppl, and Florin Manea. Efficiently Finding All Maximal alpha-gapped Repeats. In 33rd Symposium on Theoretical Aspects of Computer Science (STACS 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 47, pp. 39:1-39:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.STACS.2016.39,
  author =	{Gawrychowski, Pawel and I, Tomohiro and Inenaga, Shunsuke and K\"{o}ppl, Dominik and Manea, Florin},
  title =	{{Efficiently Finding All Maximal alpha-gapped Repeats}},
  booktitle =	{33rd Symposium on Theoretical Aspects of Computer Science (STACS 2016)},
  pages =	{39:1--39:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-001-9},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{47},
  editor =	{Ollinger, Nicolas and Vollmer, Heribert},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2016.39},
  URN =		{urn:nbn:de:0030-drops-57408},
  doi =		{10.4230/LIPIcs.STACS.2016.39},
  annote =	{Keywords: combinatorics on words, counting algorithms}
}
Document
Faster Compact On-Line Lempel-Ziv Factorization

Authors: Jun'ichi Yamamoto, Tomohiro I, Hideo Bannai, Shunsuke Inenaga, and Masayuki Takeda

Published in: LIPIcs, Volume 25, 31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014)


Abstract
We present a new on-line algorithm for computing the Lempel-Ziv factorization of a string that runs in O(N.log(N)) time and uses only O(N.log(s)) bits of working space, where N is the length of the string and s is the size of the alphabet. This is a notable improvement compared to the performance of previous on-line algorithms using the same order of working space but running in either O(N.log^3(N)) time [Okanohara and Sadakane, 2009] or O(N.log^2(N)) time [Starikovskaya, 2012]. The key to our new algorithm is in the utilization of an elegant but less popular index structure called Directed Acyclic Word Graphs, or DAWGs [Blumer et al., 1985]. We also present an opportunistic variant of our algorithm, which, given the run length encoding of size m of a string of length N, computes the Lempel-Ziv factorization of the string on-line, in O(m.min{log(log(m)).log(log(N))/(log(log(log(N)))), (log(m))^{1/2}/(log(log(m)))^{1/2})}) time and O(m.log(N)) bits of space.

Cite as

Jun'ichi Yamamoto, Tomohiro I, Hideo Bannai, Shunsuke Inenaga, and Masayuki Takeda. Faster Compact On-Line Lempel-Ziv Factorization. In 31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014). Leibniz International Proceedings in Informatics (LIPIcs), Volume 25, pp. 675-686, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)


Copy BibTex To Clipboard

@InProceedings{yamamoto_et_al:LIPIcs.STACS.2014.675,
  author =	{Yamamoto, Jun'ichi and I, Tomohiro and Bannai, Hideo and Inenaga, Shunsuke and Takeda, Masayuki},
  title =	{{Faster Compact On-Line Lempel-Ziv Factorization}},
  booktitle =	{31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014)},
  pages =	{675--686},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-65-1},
  ISSN =	{1868-8969},
  year =	{2014},
  volume =	{25},
  editor =	{Mayr, Ernst W. and Portier, Natacha},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2014.675},
  URN =		{urn:nbn:de:0030-drops-44976},
  doi =		{10.4230/LIPIcs.STACS.2014.675},
  annote =	{Keywords: Lempel-Ziv Factorization, String Index}
}
Document
An Efficient Algorithm to Test Square-Freeness of Strings Compressed by Balanced Straight Line Program

Authors: Wataru Matsubara, Shunsuke Inenaga, and Ayumi Shinohara

Published in: Dagstuhl Seminar Proceedings, Volume 8261, Structure-Based Compression of Complex Massive Data (2008)


Abstract
In this paper we study the problem of deciding whether a given compressed string contains a square. A string x is called a square if x = zz and z = u^k implies k = 1 and u = z. A string w is said to be square-free if no substrings of w are squares. Many efficient algorithms to test if a given string is square-free, have been developed so far. However, very little is known for testing square-freeness of a given compressed string. In this paper, we give an O(max(n^2; n log^2 N))-time O(n^2)-space solution to test square-freeness of a given compressed string, where n and N are the size of a given compressed string and the corresponding decompressed string, respectively. Our input strings are compressed by balanced straight line program (BSLP). We remark that BSLP has exponential compression, that is, N = O(2^n). Hence no decompress-then-test approaches can be better than our method in the worst case.

Cite as

Wataru Matsubara, Shunsuke Inenaga, and Ayumi Shinohara. An Efficient Algorithm to Test Square-Freeness of Strings Compressed by Balanced Straight Line Program. In Structure-Based Compression of Complex Massive Data. Dagstuhl Seminar Proceedings, Volume 8261, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2008)


Copy BibTex To Clipboard

@InProceedings{matsubara_et_al:DagSemProc.08261.5,
  author =	{Matsubara, Wataru and Inenaga, Shunsuke and Shinohara, Ayumi},
  title =	{{An Efficient Algorithm to Test Square-Freeness of Strings Compressed by Balanced Straight Line Program}},
  booktitle =	{Structure-Based Compression of Complex Massive Data},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2008},
  volume =	{8261},
  editor =	{Stefan B\"{o}ttcher and Markus Lohrey and Sebastian Maneth and Wojcieh Rytter},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.08261.5},
  URN =		{urn:nbn:de:0030-drops-16804},
  doi =		{10.4230/DagSemProc.08261.5},
  annote =	{Keywords: Square Freeness, Straight Line Program}
}
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail